Measuring the similarity of images is a fundamental problem in computer vision for which no universal solution exists. Although simple metrics such as the pixel-wise L2-norm have been shown to have significant flaws, they remain popular. One group of recent metrics that mitigates some of these flaws are Deep Perceptual Similarity (DPS) metrics, where similarity is evaluated as the distance between the deep features of neural networks. However, DPS metrics themselves have not been thoroughly examined for their benefits and, especially, their flaws. This work investigates the most common DPS metric, in which deep features are compared by spatial position, as well as metrics comparing averaged and sorted deep features. The metrics are analyzed in depth to understand their strengths and weaknesses by using images designed specifically to challenge them. This work provides new insights into the flaws of DPS and further suggests improvements to the metrics. The implementation of this work is available online: https://github.com/guspih/deep_perceptual_similarity_analysis/
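As a rough illustration of the three comparison strategies mentioned (spatial, averaged, and sorted deep features), a minimal PyTorch sketch follows; the backbone, layer cut-off, and normalization are assumptions made for illustration, not the exact configuration analyzed in the paper.

```python
# Minimal sketch of three deep perceptual similarity (DPS) variants: comparing
# deep features spatially, after spatial averaging, and after sorting.
# Backbone and layer choice are illustrative assumptions (downloads VGG16 weights).
import torch
import torchvision.models as models

backbone = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()

def deep_features(x):
    with torch.no_grad():
        f = backbone(x)                                    # (N, C, H, W)
    return f / (f.norm(dim=1, keepdim=True) + 1e-8)        # unit-normalize channels

def dps_spatial(a, b):
    # Compare features at matching spatial positions (the most common DPS form).
    return (deep_features(a) - deep_features(b)).pow(2).mean(dim=(1, 2, 3))

def dps_mean(a, b):
    # Compare spatially averaged features (discards spatial alignment).
    fa, fb = deep_features(a).mean(dim=(2, 3)), deep_features(b).mean(dim=(2, 3))
    return (fa - fb).pow(2).mean(dim=1)

def dps_sort(a, b):
    # Compare per-channel activations sorted over space (permutation-invariant).
    fa = deep_features(a).flatten(2).sort(dim=2).values
    fb = deep_features(b).flatten(2).sort(dim=2).values
    return (fa - fb).pow(2).mean(dim=(1, 2))

x, y = torch.rand(1, 3, 224, 224), torch.rand(1, 3, 224, 224)
print(dps_spatial(x, y), dps_mean(x, y), dps_sort(x, y))
```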
This work presents a novel self-supervised pre-training method to learn efficient representations without labels on histopathology medical images by utilizing the magnification factor. Other state-of-the-art works mostly focus on fully supervised learning approaches that rely heavily on human annotations. However, the scarcity of labeled and unlabeled data is a long-standing challenge in histopathology. Representation learning without labels currently remains unexplored in the histopathology domain. The proposed method, Magnification Prior Contrastive Similarity (MPC), enables self-supervised learning of representations without labels on a breast cancer dataset by exploiting the magnification factor, inductive transfer, and reducing human prior. The proposed method matches the state of the art of fully supervised learning in malignancy classification when only 20% of the labels are used for fine-tuning, and outperforms previous work in the fully supervised setting. It formulates a hypothesis, and provides empirical evidence to support it, that reducing human prior leads to efficient representation learning in self-supervision. The implementation of this work is available online on GitHub: https://github.com/prakashchhipa/magnification-prior-self-supervised-method
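A minimal sketch of what a magnification-based contrastive objective could look like, assuming an NT-Xent-style loss over embeddings of the same tissue region at two magnifications; the loss form, pairing scheme, and temperature are illustrative assumptions, not the authors' exact MPC formulation.

```python
# Sketch of contrastive learning where the two "views" of a sample are the same
# tissue region at two different magnification factors. The NT-Xent loss and
# temperature below are illustrative assumptions, not the paper's exact setup.
import torch
import torch.nn.functional as F

def nt_xent(z_low_mag, z_high_mag, temperature=0.5):
    """NT-Xent loss over a batch of (low-magnification, high-magnification) pairs."""
    z = F.normalize(torch.cat([z_low_mag, z_high_mag]), dim=1)      # (2N, D)
    sim = z @ z.t() / temperature                                   # cosine similarities
    n = z_low_mag.size(0)
    sim.fill_diagonal_(float("-inf"))                               # exclude self-pairs
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])  # index of each positive
    return F.cross_entropy(sim, targets)

# Toy usage: embeddings an encoder would produce for paired magnifications.
z_40x, z_100x = torch.randn(8, 128), torch.randn(8, 128)
print(nt_xent(z_40x, z_100x).item())
```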
Automatically identifying harmful content in video is an important task with a wide range of applications. However, there is a lack of professionally labeled open datasets available. In this work, an open dataset of 3589 video clips from movie trailers, annotated by professionals, is presented. An analysis of the dataset is performed, revealing among other things the relation between clip-level and trailer-level annotations. Audiovisual models are trained on the dataset, and an in-depth study of the modeling choices is conducted. The results show that performance is greatly improved by combining the visual and audio modalities, pre-training on large-scale video recognition datasets, and class-balanced sampling. Finally, the biases of the trained models are examined using discrimination probing. VidHarm is openly available, and further details are available at: https://vidharm.github.io.
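Of the modeling choices listed, class-balanced sampling is straightforward to sketch; the example below uses PyTorch's WeightedRandomSampler on hypothetical clip labels and is not the authors' training code.

```python
# Sketch of class-balanced sampling for a clip-level harmful-content classifier.
# Labels and features are placeholders; only the sampling idea mirrors the abstract.
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

labels = torch.tensor([0, 0, 0, 0, 1, 1, 2])          # imbalanced class labels
features = torch.randn(len(labels), 16)               # placeholder clip features
class_counts = torch.bincount(labels).float()
sample_weights = 1.0 / class_counts[labels]           # rarer classes drawn more often

sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels), replacement=True)
loader = DataLoader(TensorDataset(features, labels), batch_size=4, sampler=sampler)

for x, y in loader:
    print(y)  # batches now contain the rare classes far more frequently
```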
Multimodal deep learning has been used to predict clinical endpoints and diagnoses from clinical routine data. However, these models suffer from scaling issues: they have to learn pairwise interactions between each piece of information in each data type, thereby escalating model complexity beyond manageable scales. This has so far precluded a widespread use of multimodal deep learning. Here, we present a new technical approach of "learnable synergies", in which the model only selects relevant interactions between data modalities and keeps an "internal memory" of relevant data. Our approach is easily scalable and naturally adapts to multimodal data inputs from clinical routine. We demonstrate this approach on three large multimodal datasets from radiology and ophthalmology and show that it outperforms state-of-the-art models in clinically relevant diagnosis tasks. Our new approach is transferable and will allow the application of multimodal deep learning to a broad set of clinically relevant problems.
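One loose reading of the "learnable synergies" idea is a learned gate over pairwise modality interactions plus a learnable memory vector; the sketch below is an interpretation for illustration only, not the authors' architecture.

```python
# Very loose sketch of the "learnable synergies" idea as described in the abstract:
# a learned gate selects which cross-modal interactions to keep, and a learned
# memory vector carries "internal memory". Interpretation for illustration only.
import torch
import torch.nn as nn

class GatedSynergy(nn.Module):
    def __init__(self, dim, n_modalities):
        super().__init__()
        n_pairs = n_modalities * (n_modalities - 1) // 2
        self.gate = nn.Parameter(torch.zeros(n_pairs))   # one learnable gate per modality pair
        self.memory = nn.Parameter(torch.zeros(1, dim))  # learnable "internal memory"
        self.mix = nn.Linear(dim, dim)

    def forward(self, modality_embeddings):              # list of (batch, dim) tensors
        pairs, k = [], 0
        for i in range(len(modality_embeddings)):
            for j in range(i + 1, len(modality_embeddings)):
                interaction = modality_embeddings[i] * modality_embeddings[j]
                pairs.append(torch.sigmoid(self.gate[k]) * interaction)
                k += 1
        fused = torch.stack(pairs).sum(0) + self.memory  # keep only gated interactions
        return self.mix(fused)

model = GatedSynergy(dim=32, n_modalities=3)
out = model([torch.randn(4, 32) for _ in range(3)])
print(out.shape)  # torch.Size([4, 32])
```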
Our goal with this survey is to provide an overview of the state of the art in deep learning technologies for face generation and editing. We will cover popular recent architectures and discuss key ideas that make them work, such as inversion, latent representation, loss functions, training procedures, editing methods, and cross-domain style transfer. We particularly focus on GAN-based architectures that have culminated in the StyleGAN approaches, which allow generation of high-quality face images and offer rich interfaces for controllable semantics editing while preserving photo quality. We aim to provide an entry point into the field for readers who have basic knowledge of deep learning and are looking for an accessible introduction and overview.
The success of Deep Learning applications critically depends on the quality and scale of the underlying training data. Generative adversarial networks (GANs) can generate arbitrarily large datasets, but diversity and fidelity are limited, which has recently been addressed by denoising diffusion probabilistic models (DDPMs), whose superiority has been demonstrated on natural images. In this study, we propose Medfusion, a conditional latent DDPM for medical images. We compare our DDPM-based model against GAN-based models, which constitute the current state-of-the-art in the medical domain. Medfusion was trained and compared with (i) StyleGAN-3 on n=101,442 images from the AIROGS challenge dataset to generate fundoscopies with and without glaucoma, (ii) ProGAN on n=191,027 images from the CheXpert dataset to generate radiographs with and without cardiomegaly, and (iii) wGAN on n=19,557 images from the CRCMS dataset to generate histopathological images with and without microsatellite stability. In the AIROGS, CRMCS, and CheXpert datasets, Medfusion achieved lower (=better) FID than the GANs (11.63 versus 20.43, 30.03 versus 49.26, and 17.28 versus 84.31). Also, fidelity (precision) and diversity (recall) were higher (=better) for Medfusion in all three datasets. Our study shows that DDPMs are a superior alternative to GANs for image synthesis in the medical domain.
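For readers who want to reproduce this kind of FID comparison in outline, torchmetrics provides an implementation; the inputs below are placeholders, and the authors' exact evaluation protocol (feature extractor, preprocessing, sample counts) is not reproduced here.

```python
# Sketch of computing FID between real and generated images with torchmetrics.
# Real evaluations use many thousands of images; the tensors here are stand-ins.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=64)                           # small feature size for the toy example
real = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)   # stand-in real images
fake = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)   # stand-in generated images
fid.update(real, real=True)
fid.update(fake, real=False)
print(fid.compute())   # lower is better, as in the Medfusion vs. GAN comparison
```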
Partitioning an image into superpixels based on the similarity of pixels with respect to features such as colour or spatial location can significantly reduce data complexity and improve subsequent image processing tasks. Initial algorithms for unsupervised superpixel generation solely relied on local cues without prioritizing significant edges over arbitrary ones. On the other hand, more recent methods based on unsupervised deep learning either fail to properly address the trade-off between superpixel edge adherence and compactness or lack control over the generated number of superpixels. By using random images with strong spatial correlation as input, i.e., blurred noise images, in a non-convolutional image decoder we can reduce the expected number of contrasts and enforce smooth, connected edges in the reconstructed image. We generate edge-sparse pixel embeddings by encoding additional spatial information into the piece-wise smooth activation maps from the decoder's last hidden layer and use a standard clustering algorithm to extract high quality superpixels. Our proposed method reaches state-of-the-art performance on the BSDS500, PASCAL-Context and a microscopy dataset.
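The pipeline described above can be caricatured in a few lines: blurred noise in, a per-pixel (non-convolutional) decoder, spatial coordinates appended to the activations, and standard clustering; the untrained toy decoder and all settings below are illustrative assumptions, not the proposed method itself.

```python
# Caricature of the described pipeline: spatially correlated (blurred) noise is
# passed through a tiny per-pixel decoder, pixel coordinates are appended to the
# piece-wise smooth activations, and a standard clustering step yields superpixels.
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.cluster import KMeans

H, W, n_superpixels = 64, 64, 16

# Random input image with strong spatial correlation (blurred noise).
blurred_noise = gaussian_filter(np.random.rand(H, W, 3), sigma=(4, 4, 0))

# A tiny per-pixel (non-convolutional) "decoder": the same untrained MLP at every pixel.
rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=(3, 32)), rng.normal(size=(32, 8))
hidden = np.tanh(blurred_noise.reshape(-1, 3) @ w1)          # piece-wise smooth activations
embeddings = np.tanh(hidden @ w2)

# Encode additional spatial information into the embeddings.
ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
coords = np.stack([ys.ravel() / H, xs.ravel() / W], axis=1)
features = np.concatenate([embeddings, coords], axis=1)

# A standard clustering algorithm turns the pixel embeddings into superpixels.
labels = KMeans(n_clusters=n_superpixels, n_init=10).fit_predict(features).reshape(H, W)
print(labels.shape, labels.max() + 1)
```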
Recent advances in computer vision have shown promising results in image generation. Diffusion probabilistic models in particular have generated realistic images from textual input, as demonstrated by DALL-E 2, Imagen and Stable Diffusion. However, their use in medicine, where image data typically comprises three-dimensional volumes, has not been systematically evaluated. Synthetic images may play a crucial role in privacy-preserving artificial intelligence and can also be used to augment small datasets. Here we show that diffusion probabilistic models can synthesize high quality medical imaging data, which we show for Magnetic Resonance (MR) and Computed Tomography (CT) images. We provide quantitative measurements of their performance through a reader study with two medical experts who rated the quality of the synthesized images in three categories: realistic image appearance, anatomical correctness and consistency between slices. Furthermore, we demonstrate that synthetic images can be used in self-supervised pre-training and improve the performance of breast segmentation models when data is scarce (Dice score 0.91 vs. 0.95 without vs. with synthetic data).
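For reference, the Dice score quoted in the segmentation comparison measures the overlap between predicted and ground-truth masks; a minimal definition with placeholder masks follows.

```python
# Minimal Dice score between two binary segmentation masks, the metric cited in
# the breast-segmentation comparison (0.91 without vs. 0.95 with synthetic data).
import numpy as np

def dice(pred, target, eps=1e-8):
    pred, target = pred.astype(bool), target.astype(bool)
    return 2.0 * np.logical_and(pred, target).sum() / (pred.sum() + target.sum() + eps)

a = np.zeros((8, 8), dtype=int); a[2:6, 2:6] = 1   # placeholder prediction
b = np.zeros((8, 8), dtype=int); b[3:7, 3:7] = 1   # placeholder ground truth
print(round(dice(a, b), 3))  # 9 overlapping pixels of 16+16 -> 2*9/32 = 0.562
```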
Automatically predicting the outcome of subjective listening tests is a challenging task. Ratings may vary from person to person even if listeners' preferences are consistent. While previous work has focused on predicting listeners' ratings (mean opinion scores) of individual stimuli, we focus on the simpler task of predicting subjective preference given two speech stimuli of the same text. We propose a model based on anti-symmetric twin neural networks, trained on pairs of waveforms and their corresponding preference scores. We explore both attention and recurrent neural networks to account for the fact that the stimuli in a pair are not time-aligned. To obtain a large training set, we convert listeners' ratings from MUSHRA tests into values that reflect how often one stimulus in a pair was rated higher than the other. Specifically, we evaluate on data obtained from twelve MUSHRA evaluations conducted over five years, containing different TTS systems trained on data from different speakers. Our results compare favorably with a state-of-the-art model trained to predict MOS scores.
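A minimal sketch of an anti-symmetric twin network for pairwise preference: both waveforms pass through the same encoder and the predicted score flips sign when the pair is swapped; the GRU encoder and pooling are assumptions for illustration, not the paper's exact architecture.

```python
# Sketch of an anti-symmetric twin (Siamese) preference model: both waveforms
# share one encoder, and the score satisfies f(a, b) = -f(b, a) by construction.
import torch
import torch.nn as nn

class AntiSymmetricTwin(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1, bias=False)   # no bias keeps the score anti-symmetric

    def embed(self, waveform):                         # (batch, samples) -> (batch, hidden)
        _, h = self.encoder(waveform.unsqueeze(-1))
        return h.squeeze(0)

    def forward(self, wav_a, wav_b):
        # The difference of embeddings guarantees anti-symmetry of the score.
        return self.head(self.embed(wav_a) - self.embed(wav_b)).squeeze(-1)

model = AntiSymmetricTwin()
a, b = torch.randn(2, 1600), torch.randn(2, 1600)
print(model(a, b) + model(b, a))  # zeros by construction (anti-symmetry)
```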
Reliable methods for automatic readability assessment have the potential to impact a variety of fields, ranging from machine translation to self-informed learning. Recently, large language models for the German language, such as GBERT and GPT-2-Wechsel, have become available, allowing the development of deep-learning-based approaches that promise to further improve automatic readability assessment. In this contribution, we study the ability of ensembles of fine-tuned GBERT and GPT-2-Wechsel models to reliably predict the readability of German sentences. We combine these models with linguistic features and investigate the dependence of prediction performance on ensemble size and composition. Mixed ensembles of GBERT and GPT-2-Wechsel perform better than ensembles of the same size consisting only of GBERT or only of GPT-2-Wechsel models. Our models were evaluated in the GermEval 2022 shared task on text complexity assessment of German sentence data. On out-of-sample data, our best ensemble achieved a root mean squared error of 0.435.
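The ensemble-plus-RMSE evaluation can be sketched in a few lines; the member predictions below are placeholder arrays standing in for the outputs of fine-tuned GBERT and GPT-2-Wechsel models.

```python
# Sketch of the ensemble idea from the abstract: average the readability scores
# predicted by several fine-tuned models (placeholder arrays here) and evaluate
# the ensemble with the root mean squared error.
import numpy as np

def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))

true_complexity = np.array([2.1, 3.4, 1.8, 4.0])                                 # placeholder targets
gbert_preds = [np.array([2.0, 3.6, 1.9, 3.8]), np.array([2.3, 3.3, 1.7, 4.1])]   # stand-in GBERT members
gpt2w_preds = [np.array([2.2, 3.5, 2.0, 3.9])]                                   # stand-in GPT-2-Wechsel member

# Mixed ensemble: simple mean over all member predictions.
ensemble_pred = np.mean(gbert_preds + gpt2w_preds, axis=0)
print(rmse(ensemble_pred, true_complexity))
```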